Overview

Dataset statistics

Number of variables 12
Number of observations 304
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 28.6 KiB
Average record size in memory 96.4 B

Variable types

Categorical 7
Numeric 5

Alerts

Tectonic_regime has a high cardinality: 56 distinct values High cardinality
Gross is highly overall correlated with Tectonic_regime and 2 other fields High correlation
Netpay is highly overall correlated with Tectonic_regime and 4 other fields High correlation
Porosity is highly overall correlated with Structural_setting and 2 other fields High correlation
Permeability is highly overall correlated with Porosity High correlation
Tectonic_regime is highly overall correlated with Onshore/Offshore and 7 other fields High correlation
Onshore/Offshore is highly overall correlated with Tectonic_regime and 2 other fields High correlation
Structural_setting is highly overall correlated with Tectonic_regime and 7 other fields High correlation
Hydrocarbon_type is highly overall correlated with Tectonic_regime and 1 other field High correlation
Depth is highly overall correlated with Tectonic_regime and 1 other field High correlation
Period is highly overall correlated with Tectonic_regime and 4 other fields High correlation
Lithology is highly overall correlated with Tectonic_regime and 3 other fields High correlation

Reproduction

Analysis started 2022-12-10 17:08:15.211978
Analysis finished 2022-12-10 17:08:28.940707
Duration 13.73 seconds
Software version pandas-profiling v3.5.0
Download configuration config.json

Variables

Tectonic_regime
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct 56
Distinct (%) 18.4%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
COMPRESSION 58
EXTENSION 32
COMPRESSION/EROSION 25
INVERSION/COMPRESSION/EXTENSION 24
COMPRESSION/EVAPORITE 23
Other values (51) 142

Length

Max length 56
Median length 48
Mean length 24.855263
Min length 9

Characters and Unicode

Total characters 7556
Distinct characters 24
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 23
Unique (%) 7.6%

Sample

1st row STRIKE-SLIP/TRANSPRESSION/BASEMENT-I
2nd row GRAVITY/EXTENSION/EVAPORITE
3rd row GRAVITY/EXTENSION/EVAPORITE
4th row COMPRESSION
5th row INVERSION/COMPRESSION/EXTENSION

Common Values

Value Count Frequency (%)
COMPRESSION 58 19.1%
EXTENSION 32 10.5%
COMPRESSION/EROSION 25 8.2%
INVERSION/COMPRESSION/EXTENSION 24 7.9%
COMPRESSION/EVAPORITE 23 7.6%
GRAVITY/EXTENSION/EVAPORITE/SYNSEDIMENTATION 12 3.9%
EXTENSION/EROSION 11 3.6%
GRAVITY/EXTENSION/EVAPORITE 9 3.0%
INVERSION/COMPRESSION/EXTENSION/EVAPORITE 7 2.3%
INVERSION/COMPRESSION/EXTENSION/EROSION 6 2.0%
Other values (46) 97 31.9%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
compression 58 19.1%
extension 32 10.5%
compression/erosion 25 8.2%
inversion/compression/extension 24 7.9%
compression/evaporite 23 7.6%
gravity/extension/evaporite/synsedimentation 12 3.9%
extension/erosion 11 3.6%
gravity/extension/evaporite 9 3.0%
inversion/compression/extension/evaporite 7 2.3%
inversion/compression/extension/erosion 6 2.0%
Other values (46) 97 31.9%

Most occurring characters

Value Count Frequency (%)
E 998 13.2%
O 853 11.3%
S 841 11.1%
I 839 11.1%
N 838 11.1%
R 521 6.9%
/ 452 6.0%
T 443 5.9%
P 337 4.5%
A 248 3.3%
Other values (14) 1186 15.7%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 7062 93.5%
Other Punctuation 452 6.0%
Dash Punctuation 42 0.6%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
E 998 14.1%
O 853 12.1%
S 841 11.9%
I 839 11.9%
N 838 11.9%
R 521 7.4%
T 443 6.3%
P 337 4.8%
A 248 3.5%
M 235 3.3%
Other values (12) 909 12.9%

Other Punctuation
Value Count Frequency (%)
/ 452 100.0%

Dash Punctuation
Value Count Frequency (%)
- 42 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 7062 93.5%
Common 494 6.5%

Most frequent character per script

Latin
Value Count Frequency (%)
E 998 14.1%
O 853 12.1%
S 841 11.9%
I 839 11.9%
N 838 11.9%
R 521 7.4%
T 443 6.3%
P 337 4.8%
A 248 3.5%
M 235 3.3%
Other values (12) 909 12.9%

Common
Value Count Frequency (%)
/ 452 91.5%
- 42 8.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 7556 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
E 998 13.2%
O 853 11.3%
S 841 11.1%
I 839 11.1%
N 838 11.1%
R 521 6.9%
/ 452 6.0%
T 443 5.9%
P 337 4.5%
A 248 3.3%
Other values (14) 1186 15.7%

Onshore/Offshore
Categorical

Distinct 2
Distinct (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
1 211
0 93

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 304
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 211 69.4%
0 93 30.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value Count Frequency (%)
1 211 69.4%
0 93 30.6%

Most occurring characters

Value Count Frequency (%)
1 211 69.4%
0 93 30.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 304 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 211 69.4%
0 93 30.6%

Most occurring scripts

Value Count Frequency (%)
Common 304 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 211 69.4%
0 93 30.6%

Most occurring blocks

Value Count Frequency (%)
ASCII 304 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 211 69.4%
0 93 30.6%

Hydrocarbon_type
Categorical

Distinct 5
Distinct (%) 1.6%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
OIL 230
GAS 47
GAS-CONDENSATE 25
METHANE HYDRATE 1
CARBON DIOXIDE 1

Length

Max length 15
Median length 3
Mean length 3.9802632
Min length 3

Characters and Unicode

Total characters 1210
Distinct characters 19
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 2
Unique (%) 0.7%

Sample

1st row OIL
2nd row OIL
3rd row OIL
4th row OIL
5th row OIL

Common Values

Value Count Frequency (%)
OIL 230 75.7%
GAS 47 15.5%
GAS-CONDENSATE 25 8.2%
METHANE HYDRATE 1 0.3%
CARBON DIOXIDE 1 0.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value Count Frequency (%)
oil 230 75.2%
gas 47 15.4%
gas-condensate 25 8.2%
methane 1 0.3%
hydrate 1 0.3%
carbon 1 0.3%
dioxide 1 0.3%

Most occurring characters

Value Count Frequency (%)
O 257 21.2%
I 232 19.2%
L 230 19.0%
A 100 8.3%
S 97 8.0%
G 72 6.0%
E 54 4.5%
N 52 4.3%
D 28 2.3%
T 27 2.2%
Other values (9) 61 5.0%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 1183 97.8%
Dash Punctuation 25 2.1%
Space Separator 2 0.2%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
O 257 21.7%
I 232 19.6%
L 230 19.4%
A 100 8.5%
S 97 8.2%
G 72 6.1%
E 54 4.6%
N 52 4.4%
D 28 2.4%
T 27 2.3%
Other values (7) 34 2.9%

Dash Punctuation
Value Count Frequency (%)
- 25 100.0%

Space Separator
Value Count Frequency (%)
(space) 2 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 1183 97.8%
Common 27 2.2%

Most frequent character per script

Latin
Value Count Frequency (%)
O 257 21.7%
I 232 19.6%
L 230 19.4%
A 100 8.5%
S 97 8.2%
G 72 6.1%
E 54 4.6%
N 52 4.4%
D 28 2.4%
T 27 2.3%
Other values (7) 34 2.9%

Common
Value Count Frequency (%)
- 25 92.6%
(space) 2 7.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 1210 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
O 257 21.2%
I 232 19.2%
L 230 19.0%
A 100 8.3%
S 97 8.0%
G 72 6.0%
E 54 4.5%
N 52 4.3%
D 28 2.3%
T 27 2.2%
Other values (9) 61 5.0%

Reservoir_status
Categorical

Distinct 12
Distinct (%) 3.9%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
DECLINING PRODUCTION 92
MATURE PRODUCTION 54
NEARLY DEPLETED 49
PLATEAU PRODUCTION 32
DEVELOPING 21
Other values (7) 56

Length

Max length 24
Median length 20
Mean length 16.407895
Min length 7

Characters and Unicode

Total characters 4988
Distinct characters 22
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) 0.3%

Sample

1st row DEVELOPING
2nd row MATURE PRODUCTION
3rd row MATURE PRODUCTION
4th row DECLINING PRODUCTION
5th row DECLINING PRODUCTION

Common Values

Value Count Frequency (%)
DECLINING PRODUCTION 92 30.3%
MATURE PRODUCTION 54 17.8%
NEARLY DEPLETED 49 16.1%
PLATEAU PRODUCTION 32 10.5%
DEVELOPING 21 6.9%
REJUVENATING 21 6.9%
UNKNOWN 12 3.9%
UNDEVELOPED 7 2.3%
CONTINUING DEVELOPMENT 6 2.0%
SECOND PLATEAU PRODUTION 5 1.6%
Other values (2) 5 1.6%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
production 178 32.5%
declining 92 16.8%
mature 54 9.9%
depleted 50 9.1%
nearly 49 9.0%
plateau 37 6.8%
developing 21 3.8%
rejuvenating 21 3.8%
unknown 12 2.2%
undeveloped 7 1.3%
Other values (5) 26 4.8%

Most occurring characters

Value Count Frequency (%)
N 559 11.2%
E 514 10.3%
D 429 8.6%
O 427 8.6%
I 421 8.4%
T 357 7.2%
U 320 6.4%
R 307 6.2%
P 304 6.1%
C 281 5.6%
Other values (12) 1069 21.4%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 4745 95.1%
Space Separator 243 4.9%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
N 559 11.8%
E 514 10.8%
D 429 9.0%
O 427 9.0%
I 421 8.9%
T 357 7.5%
U 320 6.7%
R 307 6.5%
P 304 6.4%
C 281 5.9%
Other values (11) 826 17.4%

Space Separator
Value Count Frequency (%)
(space) 243 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 4745 95.1%
Common 243 4.9%

Most frequent character per script

Latin
Value Count Frequency (%)
N 559 11.8%
E 514 10.8%
D 429 9.0%
O 427 9.0%
I 421 8.9%
T 357 7.5%
U 320 6.7%
R 307 6.5%
P 304 6.4%
C 281 5.9%
Other values (11) 826 17.4%

Common
Value Count Frequency (%)
(space) 243 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 4988 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
N 559 11.2%
E 514 10.3%
D 429 8.6%
O 427 8.6%
I 421 8.4%
T 357 7.2%
U 320 6.4%
R 307 6.2%
P 304 6.1%
C 281 5.6%
Other values (12) 1069 21.4%

Structural_setting
Categorical

Distinct 47
Distinct (%) 15.5%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
FORELAND 70
RIFT 48
INTRACRATONIC 28
PASSIVE MARGIN 16
INVERSION/RIFT 15
Other values (42) 127

Length

Max length 29
Median length 25
Mean length 11.368421
Min length 4

Characters and Unicode

Total characters 3456
Distinct characters 24
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 15
Unique (%) 4.9%

Sample

1st row INVERSION/WRENCH
2nd row SALT/PASSIVE MARGIN
3rd row PASSIVE MARGIN
4th row THRUST
5th row INVERSION/RIFT

Common Values

Value Count Frequency (%)
FORELAND 70 23.0%
RIFT 48 15.8%
INTRACRATONIC 28 9.2%
PASSIVE MARGIN 16 5.3%
INVERSION/RIFT 15 4.9%
THRUST 14 4.6%
SALT/FORELAND 13 4.3%
SALT/PASSIVE MARGIN 11 3.6%
DELTA/PASSIVE MARGIN 6 2.0%
INVERSION/BACKARC 5 1.6%
Other values (37) 78 25.7%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
foreland 70 20.1%
rift 48 13.8%
margin 45 12.9%
intracratonic 28 8.0%
passive 16 4.6%
inversion/rift 15 4.3%
thrust 14 4.0%
salt/foreland 13 3.7%
salt/passive 11 3.2%
delta/passive 6 1.7%
Other values (38) 83 23.8%

Most occurring characters

Value Count Frequency (%)
R 392 11.3%
A 354 10.2%
I 307 8.9%
N 303 8.8%
T 282 8.2%
S 240 6.9%
E 226 6.5%
F 192 5.6%
L 185 5.4%
O 176 5.1%
Other values (14) 799 23.1%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 3254 94.2%
Other Punctuation 136 3.9%
Space Separator 45 1.3%
Dash Punctuation 21 0.6%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
R 392 12.0%
A 354 10.9%
I 307 9.4%
N 303 9.3%
T 282 8.7%
S 240 7.4%
E 226 6.9%
F 192 5.9%
L 185 5.7%
O 176 5.4%
Other values (11) 597 18.3%

Other Punctuation
Value Count Frequency (%)
/ 136 100.0%

Space Separator
Value Count Frequency (%)
(space) 45 100.0%

Dash Punctuation
Value Count Frequency (%)
- 21 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 3254 94.2%
Common 202 5.8%

Most frequent character per script

Latin
Value Count Frequency (%)
R 392 12.0%
A 354 10.9%
I 307 9.4%
N 303 9.3%
T 282 8.7%
S 240 7.4%
E 226 6.9%
F 192 5.9%
L 185 5.7%
O 176 5.4%
Other values (11) 597 18.3%

Common
Value Count Frequency (%)
/ 136 67.3%
(space) 45 22.3%
- 21 10.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 3456 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
R 392 11.3%
A 354 10.2%
I 307 8.9%
N 303 8.8%
T 282 8.2%
S 240 6.9%
E 226 6.5%
F 192 5.6%
L 185 5.4%
O 176 5.1%
Other values (14) 799 23.1%

Depth
Real number (ℝ)

Distinct 276
Distinct (%) 90.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 6781.0954
Minimum 220
Maximum 18050
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 220
5-th percentile 1844.75
Q1 4000
median 6680.5
Q3 9412.5
95-th percentile 12962.5
Maximum 18050
Range 17830
Interquartile range (IQR) 5412.5

Descriptive statistics

Standard deviation 3499.0348
Coefficient of variation (CV) 0.51599846
Kurtosis -0.24839987
Mean 6781.0954
Median Absolute Deviation (MAD) 2700
Skewness 0.42939405
Sum 2061453
Variance 12243244
Monotonicity Not monotonic
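
Several of the statistics above are derived from others. The sketch below re-enters the reported Depth values by hand and checks that the range, IQR, coefficient of variation and variance are mutually consistent. The variable names are illustrative, and small differences are expected because the report rounds the values it displays.

```python
# Values copied from the Depth statistics in this report.
minimum, maximum = 220, 18050
q1, q3 = 4000, 9412.5
mean, std = 6781.0954, 3499.0348

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(value_range, iqr, round(cv, 8), round(variance))
```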
[Figure: Histogram with fixed size bins (bins=50)]

Value Count Frequency (%)
2950 4 1.3%
8200 3 1.0%
4400 3 1.0%
4000 3 1.0%
7044 2 0.7%
2310 2 0.7%
5500 2 0.7%
10500 2 0.7%
4300 2 0.7%
5800 2 0.7%
Other values (266) 279 91.8%

Value Count Frequency (%)
220 1 0.3%
480 1 0.3%
490 1 0.3%
500 1 0.3%
600 1 0.3%
758 1 0.3%
984 1 0.3%
1000 1 0.3%
1030 1 0.3%
1104 1 0.3%

Value Count Frequency (%)
18050 1 0.3%
16360 1 0.3%
15460 1 0.3%
15420 1 0.3%
15295 1 0.3%
15250 1 0.3%
14700 1 0.3%
14600 1 0.3%
14500 1 0.3%
14231 1 0.3%

Period
Categorical

Distinct 22
Distinct (%) 7.2%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
CRETACEOUS 82
NEOGENE 42
JURASSIC 41
PALEOGENE 34
CARBONIFEROUS 25
Other values (17) 80

Length

Max length 24
Median length 20.5
Mean length 9.75
Min length 7

Characters and Unicode

Total characters 2964
Distinct characters 22
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 7
Unique (%) 2.3%

Sample

1st row NEOGENE
2nd row CRETACEOUS
3rd row CRETACEOUS
4th row CRETACEOUS
5th row CRETACEOUS

Common Values

Value Count Frequency (%)
CRETACEOUS 82 27.0%
NEOGENE 42 13.8%
JURASSIC 41 13.5%
PALEOGENE 34 11.2%
CARBONIFEROUS 25 8.2%
PERMIAN 22 7.2%
DEVONIAN 16 5.3%
TRIASSIC 9 3.0%
CRETACEOUS-PALEOGENE 8 2.6%
PROTEROZOIC 5 1.6%
Other values (12) 20 6.6%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
cretaceous 82 27.0%
neogene 42 13.8%
jurassic 41 13.5%
paleogene 34 11.2%
carboniferous 25 8.2%
permian 22 7.2%
devonian 16 5.3%
triassic 9 3.0%
cretaceous-paleogene 8 2.6%
proterozoic 5 1.6%
Other values (12) 20 6.6%

Most occurring characters

Value Count Frequency (%)
E 537 18.1%
C 287 9.7%
O 287 9.7%
A 278 9.4%
R 254 8.6%
S 236 8.0%
N 230 7.8%
U 168 5.7%
I 156 5.3%
T 110 3.7%
Other values (12) 421 14.2%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 2942 99.3%
Dash Punctuation 22 0.7%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
E 537 18.3%
C 287 9.8%
O 287 9.8%
A 278 9.4%
R 254 8.6%
S 236 8.0%
N 230 7.8%
U 168 5.7%
I 156 5.3%
T 110 3.7%
Other values (11) 399 13.6%

Dash Punctuation
Value Count Frequency (%)
- 22 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 2942 99.3%
Common 22 0.7%

Most frequent character per script

Latin
Value Count Frequency (%)
E 537 18.3%
C 287 9.8%
O 287 9.8%
A 278 9.4%
R 254 8.6%
S 236 8.0%
N 230 7.8%
U 168 5.7%
I 156 5.3%
T 110 3.7%
Other values (11) 399 13.6%

Common
Value Count Frequency (%)
- 22 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 2964 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
E 537 18.1%
C 287 9.7%
O 287 9.7%
A 278 9.4%
R 254 8.6%
S 236 8.0%
N 230 7.8%
U 168 5.7%
I 156 5.3%
T 110 3.7%
Other values (12) 421 14.2%

Lithology
Categorical

Distinct 16
Distinct (%) 5.3%
Missing 0
Missing (%) 0.0%
Memory size 2.5 KiB
SANDSTONE 178
LIMESTONE 42
DOLOMITE 37
LOW-RESISTIVITY SANDSTONE 9
CONGLOMERATE 7
Other values (11) 31

Length

Max length 25
Median length 9
Mean length 9.7467105
Min length 5

Characters and Unicode

Total characters 2963
Distinct characters 21
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 4
Unique (%) 1.3%

Sample

1st row SANDSTONE
2nd row LIMESTONE
3rd row LIMESTONE
4th row SANDSTONE
5th row SANDSTONE

Common Values

Value Count Frequency (%)
SANDSTONE 178 58.6%
LIMESTONE 42 13.8%
DOLOMITE 37 12.2%
LOW-RESISTIVITY SANDSTONE 9 3.0%
CONGLOMERATE 7 2.3%
CHALK 7 2.3%
CHALKY LIMESTONE 6 2.0%
THINLY-BEDDED SANDSTONE 4 1.3%
SHALY SANDSTONE 3 1.0%
SILTSTONE 3 1.0%
Other values (6) 8 2.6%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
sandstone 194 59.1%
limestone 50 15.2%
dolomite 37 11.3%
low-resistivity 9 2.7%
conglomerate 7 2.1%
chalk 7 2.1%
chalky 6 1.8%
thinly-bedded 4 1.2%
shaly 3 0.9%
siltstone 3 0.9%
Other values (6) 8 2.4%

Most occurring characters

Value Count Frequency (%)
S 469 15.8%
N 455 15.4%
E 370 12.5%
O 351 11.8%
T 322 10.9%
D 246 8.3%
A 222 7.5%
L 131 4.4%
I 129 4.4%
M 98 3.3%
Other values (11) 170 5.7%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 2926 98.8%
Space Separator 24 0.8%
Dash Punctuation 13 0.4%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
S 469 16.0%
N 455 15.6%
E 370 12.6%
O 351 12.0%
T 322 11.0%
D 246 8.4%
A 222 7.6%
L 131 4.5%
I 129 4.4%
M 98 3.3%
Other values (9) 133 4.5%

Space Separator
Value Count Frequency (%)
(space) 24 100.0%

Dash Punctuation
Value Count Frequency (%)
- 13 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 2926 98.8%
Common 37 1.2%

Most frequent character per script

Latin
Value Count Frequency (%)
S 469 16.0%
N 455 15.6%
E 370 12.6%
O 351 12.0%
T 322 11.0%
D 246 8.4%
A 222 7.6%
L 131 4.5%
I 129 4.4%
M 98 3.3%
Other values (9) 133 4.5%

Common
Value Count Frequency (%)
(space) 24 64.9%
- 13 35.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 2963 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
S 469 15.8%
N 455 15.4%
E 370 12.5%
O 351 11.8%
T 322 10.9%
D 246 8.3%
A 222 7.5%
L 131 4.4%
I 129 4.4%
M 98 3.3%
Other values (11) 170 5.7%

Gross
Real number (ℝ)

Distinct 176
Distinct (%) 57.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 799.45066
Minimum 11
Maximum 10500
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 11
5-th percentile 60
Q1 143.75
median 350
Q3 792.5
95-th percentile 3127.5
Maximum 10500
Range 10489
Interquartile range (IQR) 648.75

Descriptive statistics

Standard deviation 1339.5467
Coefficient of variation (CV) 1.675584
Kurtosis 17.079529
Mean 799.45066
Median Absolute Deviation (MAD) 250
Skewness 3.7767336
Sum 243033
Variance 1794385.5
Monotonicity Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value Count Frequency (%)
300 11 3.6%
100 10 3.3%
250 8 2.6%
80 7 2.3%
115 6 2.0%
120 6 2.0%
400 6 2.0%
350 6 2.0%
200 5 1.6%
500 5 1.6%
Other values (166) 234 77.0%

Value Count Frequency (%)
11 1 0.3%
20 1 0.3%
30 1 0.3%
35 1 0.3%
40 1 0.3%
43 1 0.3%
45 1 0.3%
46 2 0.7%
47 2 0.7%
50 1 0.3%

Value Count Frequency (%)
10500 1 0.3%
8000 1 0.3%
7500 1 0.3%
6739 1 0.3%
6562 1 0.3%
5900 1 0.3%
5800 1 0.3%
5350 1 0.3%
5335 1 0.3%
5249 1 0.3%

Netpay
Real number (ℝ)

Distinct 175
Distinct (%) 57.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 204.24447
Minimum 0
Maximum 2976
Zeros 1
Zeros (%) 0.3%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 0
5-th percentile 15
Q1 46
median 115
Q3 242.5
95-th percentile 650
Maximum 2976
Range 2976
Interquartile range (IQR) 196.5

Descriptive statistics

Standard deviation 287.24819
Coefficient of variation (CV) 1.4063939
Kurtosis 32.051444
Mean 204.24447
Median Absolute Deviation (MAD) 81.5
Skewness 4.4743082
Sum 62090.32
Variance 82511.52
Monotonicity Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value Count Frequency (%)
150 9 3.0%
100 9 3.0%
200 8 2.6%
120 7 2.3%
50 6 2.0%
33 6 2.0%
15 5 1.6%
20 5 1.6%
35 5 1.6%
160 4 1.3%
Other values (165) 240 78.9%

Value Count Frequency (%)
0 1 0.3%
2.12 1 0.3%
4 1 0.3%
7 2 0.7%
10 1 0.3%
12 2 0.7%
13 2 0.7%
14 2 0.7%
15 5 1.6%
15.5 1 0.3%

Value Count Frequency (%)
2976 1 0.3%
1600 1 0.3%
1500 1 0.3%
1466 1 0.3%
1000 1 0.3%
984 1 0.3%
928 1 0.3%
902 1 0.3%
900 2 0.7%
850 1 0.3%

Porosity
Real number (ℝ)

Distinct 71
Distinct (%) 23.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 17.810132
Minimum 1.3
Maximum 55
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 1.3
5-th percentile 6.015
Q1 12
median 17
Q3 23.7
95-th percentile 30
Maximum 55
Range 53.7
Interquartile range (IQR) 11.7

Descriptive statistics

Standard deviation 7.5552406
Coefficient of variation (CV) 0.42421026
Kurtosis 0.92500689
Mean 17.810132
Median Absolute Deviation (MAD) 6
Skewness 0.51622232
Sum 5414.28
Variance 57.081661
Monotonicity Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value Count Frequency (%)
25 18 5.9%
20 17 5.6%
10 16 5.3%
14 15 4.9%
11 14 4.6%
16 14 4.6%
17 13 4.3%
15 13 4.3%
18 13 4.3%
12 11 3.6%
Other values (61) 160 52.6%

Value Count Frequency (%)
1.3 1 0.3%
1.8 1 0.3%
3.2 1 0.3%
3.8 1 0.3%
4 3 1.0%
5 2 0.7%
5.4 1 0.3%
5.5 1 0.3%
5.6 1 0.3%
6 4 1.3%

Value Count Frequency (%)
55 1 0.3%
35 2 0.7%
34 1 0.3%
33.8 1 0.3%
33 1 0.3%
32 5 1.6%
31 1 0.3%
30 7 2.3%
29 3 1.0%
28.5 2 0.7%

Permeability
Real number (ℝ)

Distinct 149
Distinct (%) 49.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 457.47368
Minimum 0.01
Maximum 7500
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 2.5 KiB

Quantile statistics

Minimum 0.01
5-th percentile 0.5
Q1 10
median 86.65
Q3 427.5
95-th percentile 2000
Maximum 7500
Range 7499.99
Interquartile range (IQR) 417.5

Descriptive statistics

Standard deviation 982.81055
Coefficient of variation (CV) 2.1483434
Kurtosis 25.250821
Mean 457.47368
Median Absolute Deviation (MAD) 85.65
Skewness 4.4636298
Sum 139072
Variance 965916.58
Monotonicity Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value Count Frequency (%)
100 12 3.9%
20 11 3.6%
1000 10 3.3%
5 9 3.0%
2000 9 3.0%
10 8 2.6%
30 8 2.6%
500 8 2.6%
0.5 7 2.3%
200 6 2.0%
Other values (139) 216 71.1%

Value Count Frequency (%)
0.01 1 0.3%
0.04 1 0.3%
0.09 1 0.3%
0.1 4 1.3%
0.2 1 0.3%
0.3 1 0.3%
0.4 2 0.7%
0.5 7 2.3%
0.55 1 0.3%
0.6 3 1.0%

Value Count Frequency (%)
7500 2 0.7%
7000 1 0.3%
5000 2 0.7%
3000 2 0.7%
2529 1 0.3%
2500 1 0.3%
2460 1 0.3%
2250 1 0.3%
2098 1 0.3%
2000 9 3.0%

Interactions

[Figures: 25 interaction plots]

Correlations


Auto

The auto setting selects an interpretable pairwise metric based on the types of the two columns, using the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
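
The mapping above can be sketched as a small dispatch function. Everything here is illustrative: `auto_method` and `discretize` are hypothetical helpers, not the library's API, and equal-width binning is an assumed strategy, since the text only says that the numerical column is discretized into `n_bins` bins.

```python
def auto_method(type_a, type_b):
    """Return (method, value range) for a pair of column types, per the mapping above."""
    kinds = {type_a, type_b}
    if not kinds <= {"Numerical", "Categorical"}:
        raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")
    if kinds == {"Numerical"}:
        return "Spearman's rho", (-1.0, 1.0)
    # Categorical-Categorical and Numerical-Categorical both map to Cramer's V.
    return "Cramer's V", (0.0, 1.0)


def discretize(values, n_bins):
    """Assign each number to one of n_bins equal-width bins (assumed strategy)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum would fall just past the last bin, so clamp it into bin n_bins - 1.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


method, value_range = auto_method("Numerical", "Categorical")
# Minimum, Q1, median, Q3 and maximum of Depth, as reported above.
bins = discretize([220, 4000, 6680.5, 9412.5, 18050], n_bins=4)
print(method, value_range, bins)
```

A larger `n_bins` gives a finer-grained view of the numerical column, at the cost of fewer observations per bin.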

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
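
A minimal sketch of that calculation in plain Python follows. The function names are illustrative, ties are given their average rank (a common convention, assumed here), and the example uses the Porosity and Permeability values of the first five rows in this report's sample.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


porosity = [20.0, 19.0, 12.0, 13.0, 24.0]
permeability = [45.0, 175.0, 20.0, 600.0, 182.0]
rho = spearman(porosity, permeability)
print(round(rho, 4))
```

With only five rows this says nothing about the full dataset; it only shows the mechanics of ranking followed by a Pearson correlation.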

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
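
The sketch below implements this formula directly and illustrates the invariance property on the Depth and Porosity values of the first five sample rows. The function name `pearson_r` and the particular shift and scale factors are arbitrary illustrative choices; the scale factors are kept positive, since a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


depth = [3520, 9967, 8700, 5084, 1030]
porosity = [20.0, 19.0, 12.0, 13.0, 24.0]

r = pearson_r(depth, porosity)
# Shift and rescale each variable separately; r should not change.
r_rescaled = pearson_r([2.0 * d + 100 for d in depth], [p / 100 for p in porosity])
print(r, r_rescaled)
```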

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
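
The pair-counting definition can be sketched as follows. This is the simple form described in the text (often called tau-a); tie-adjusted variants exist and may be what a given library computes, so treat the function `kendall_tau_a` as an illustration rather than the report's exact method. The data are the Porosity and Permeability values of the first five sample rows.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


porosity = [20.0, 19.0, 12.0, 13.0, 24.0]
permeability = [45.0, 175.0, 20.0, 600.0, 182.0]
tau = kendall_tau_a(porosity, permeability)
print(tau)
```

For these five rows there are 10 pairs, of which 6 are concordant and 4 are discordant.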

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
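
A minimal sketch of a bias-corrected Cramér's V, following the commonly cited form of Bergsma's correction, is shown below. The function name and the synthetic label lists are illustrative assumptions, and the report's own implementation may differ in detail.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(a)
    row_totals, col_totals = Counter(a), Counter(b)
    joint = Counter(zip(a, b))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for r_label, n_r in row_totals.items():
        for c_label, n_c in col_totals.items():
            expected = n_r * n_c / n
            observed = joint.get((r_label, c_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Corrected phi-squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association information
    return sqrt(phi2_corr / denom)


# Synthetic labels: the first pair is perfectly associated, the second independent.
v_perfect = cramers_v_corrected(["A"] * 5 + ["B"] * 5, ["x"] * 5 + ["y"] * 5)
v_independent = cramers_v_corrected(["A", "A", "B", "B"], ["x", "y", "x", "y"])
print(v_perfect, v_independent)
```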

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

Index | Tectonic_regime | Onshore/Offshore | Hydrocarbon_type | Reservoir_status | Structural_setting | Depth | Period | Lithology | Gross | Netpay | Porosity | Permeability
0 | STRIKE-SLIP/TRANSPRESSION/BASEMENT-I | 0 | OIL | DEVELOPING | INVERSION/WRENCH | 3520 | NEOGENE | SANDSTONE | 2460.0 | 220.0 | 20.0 | 45.0
1 | GRAVITY/EXTENSION/EVAPORITE | 0 | OIL | MATURE PRODUCTION | SALT/PASSIVE MARGIN | 9967 | CRETACEOUS | LIMESTONE | 427.0 | 160.0 | 19.0 | 175.0
2 | GRAVITY/EXTENSION/EVAPORITE | 1 | OIL | MATURE PRODUCTION | PASSIVE MARGIN | 8700 | CRETACEOUS | LIMESTONE | 95.0 | 15.0 | 12.0 | 20.0
3 | COMPRESSION | 1 | OIL | DECLINING PRODUCTION | THRUST | 5084 | CRETACEOUS | SANDSTONE | 328.0 | 300.0 | 13.0 | 600.0
4 | INVERSION/COMPRESSION/EXTENSION | 1 | OIL | DECLINING PRODUCTION | INVERSION/RIFT | 1030 | CRETACEOUS | SANDSTONE | 260.0 | 33.0 | 24.0 | 182.0
5 | COMPRESSION/EXTENSION/EVAPORITE | 1 | OIL | DECLINING PRODUCTION | INTRACRATONIC | 5575 | CARBONIFEROUS | DOLOMITE | 80.0 | 46.0 | 14.0 | 15.0
6 | INVERSION/COMPRESSION/EXTENSION | 1 | OIL | DEVELOPING | INVERSION/RIFT | 5216 | PROTEROZOIC | SANDSTONE | 200.0 | 25.0 | 10.0 | 209.0
7 | INVERSION/COMPRESSION/EXTENSION | 0 | OIL | DEVELOPING | RIFT | 8100 | CRETACEOUS | SANDSTONE | 115.0 | 40.0 | 15.0 | 30.0
8 | COMPRESSION | 1 | OIL | DECLINING PRODUCTION | INTRACRATONIC | 1915 | CARBONIFEROUS | LIMESTONE | 330.0 | 20.0 | 10.0 | 35.0
9 | COMPRESSION | 1 | OIL | DECLINING PRODUCTION | FORELAND | 4150 | PERMIAN | DOLOMITE | 225.0 | 200.0 | 17.5 | 62.0

Index | Tectonic_regime | Onshore/Offshore | Hydrocarbon_type | Reservoir_status | Structural_setting | Depth | Period | Lithology | Gross | Netpay | Porosity | Permeability
294 | COMPRESSION/EROSION | 1 | OIL | REJUVENATING | SUB-THRUST | 6970 | TRIASSIC | CONGLOMERATE | 525.0 | 100.0 | 12.2 | 71.2
295 | EXTENSION/EVAPORITE/GRAVITY | 1 | OIL | NEARLY DEPLETED | RIFT | 2421 | PALEOGENE | SANDSTONE | 5900.0 | 28.0 | 18.3 | 386.0
296 | INVERSION/COMPRESSION/EXTENSION | 1 | OIL | DECLINING PRODUCTION | INVERSION/RIFT | 2734 | CRETACEOUS | THINLY-BEDDED SANDSTONE | 600.0 | 100.0 | 26.0 | 1500.0
297 | COMPRESSION/EROSION | 1 | OIL | MATURE PRODUCTION | FORELAND | 3050 | CRETACEOUS | SANDSTONE | 165.0 | 35.0 | 25.0 | 2000.0
298 | GRAVITY/EXTENSION/EVAPORITE | 0 | OIL | PLATEAU PRODUCTION | SALT/PASSIVE MARGIN | 5543 | PALEOGENE-NEOGENE | SANDSTONE | 150.0 | 150.0 | 28.5 | 2000.0
299 | GRAVITY/EXTENSION/EVAPORITE/SYNSEDIMENTATION | 0 | OIL | DECLINING PRODUCTION | DELTA/SUB-SALT/PASSIVE MARGIN | 13265 | NEOGENE | LOW-RESISTIVITY SANDSTONE | 1500.0 | 295.0 | 29.0 | 1500.0
300 | INVERSION/COMPRESSION/EXTENSION | 0 | OIL | DECLINING PRODUCTION | RIFT/PASSIVE MARGIN | 1657 | CRETACEOUS | LOW-RESISTIVITY SANDSTONE | 164.0 | 98.0 | 32.0 | 7500.0
301 | COMPRESSION/EVAPORITE | 1 | OIL | CONTINUING DEVELOPMENT | FORELAND | 10211 | CRETACEOUS | CHALKY LIMESTONE | 328.0 | 213.0 | 13.0 | 0.8
302 | INVERSION/COMPRESSION/EXTENSION/EVAPORITE | 0 | GAS-CONDENSATE | PLATEAU PRODUCTION | SALT/RIFT | 16360 | JURASSIC | SANDSTONE | 980.0 | 490.0 | 16.0 | 10.0
303 | EXTENSION | 1 | GAS-CONDENSATE | DECLINING PRODUCTION | RIFT | 11200 | CRETACEOUS | SANDSTONE | 1378.0 | 446.0 | 14.0 | 340.0